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Empirical best linear unbiased prediction (EBLUP) method uses 
a linear mixed model in combining information from different sources 
of information. This method is particularly useful in small area prob- 
lems. The variability of an EBLUP is traditionally measured by the 
mean squared prediction error (MSPE), and interval estimates are 
generally constructed using estimates of the MSPE. Such methods 
have shortcomings like under-coverage or over-coverage, excessive 
length and lack of interpretability. We propose a parametric boot- 
strap approach to estimate the entire distribution of a suitably cen- 
tered and scaled EBLUP. The bootstrap histogram is highly accurate, 
and differs from the true EBLUP distribution by only 0(d 3 n~ 3/2 ), 
where d is the number of parameters and n the number of observa- 
tions. This result is used to obtain highly accurate prediction inter- 
vals. Simulation results demonstrate the superiority of this method 
over existing techniques of constructing prediction intervals in linear 
mixed models. 

1. Introduction. Large scale sample surveys are usually designed to pro- 
duce reliable estimates of various characteristics of interest for large geo- 
graphic areas. However, for effective planning of health, social and other 
services, and for apportioning government funds, there is a growing demand 
to produce similar estimates for smaller geographic areas and for other sub- 
populations. To meet this demand, it is necessary to supplement the survey 
data with other relevant information that is often obtained from different 
administrative and census records. In many small area applications, mixed 
linear models are now routinely used in combining information from various 
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sources and explaining different sources of errors. These models incorporate 
area specific random effects which explain the "between small area varia- 
tions," not otherwise explained by the fixed effects part of the model. 

For a good review on small area and linear mixed model research, the read- 
ers are referred to the book by Rao (2003), and two recent review papers by 
Rao (2005) and Jiang and Lahiri (2006). Several other applications of linear 
mixed models may be found in McCulloch and Searle (2001). Point pre- 
diction using the empirical best linear unbiased predictor (EBLUP) and the 
associated mean square prediction error (MSPE) estimation have been stud- 
ied extensively. See Jiang, Lahiri and Wan (2002) , Rao (2005) and Jiang and 
Lahiri (2006) for a review on the subject, especially on the latest develop- 
ment on resampling methods for MSPE estimation. However, little progress 
has been made outside the basic study of the first two moments, for exam- 
ple, on the properties of quantiles (central or tail) of predictors, or on the 
effect of high dimensionality of the parameters. 

For example, research on interval estimates in small area studies is typ- 
ically limited to some special cases of the Fay-Herriot model (described 
in detail in Section 2), where the traditional estimates are of the form 
EBLUP ± z a /2^/rnspe. Here mspe is an estimate of the true MSPE of the 
EBLUP, and z a /2 is the upper 100(1 — q/2)% point of the standard normal 
distribution. The coverage probabilities of such intervals may converge to 
the nominal level 1 — a; but the intervals are not efficient, in the sense they 
have either under-coverage or over-coverage problem, depending on the par- 
ticular choice of the MSPE estimator. More precisely, the coverage error of 
such interval is of the order 0{n~ l ) or higher, which is not accurate enough 
for most applications of small area studies, many of which involve small 
sample size n. 

In this paper we address the problem of approximating the distribution of 
a predictor, and applying it to obtain prediction intervals, in a very general 
framework of linear mixed models. We consider the following model from 
Das, Jiang and Rao (2004): 

(1.1) Y n = X/3 + Zv g + e n , 

where Y n 6 IR n is a vector of observed responses, X nxp and Z nX(? are known 
matrices and v q and e n are independent random variables with dispersion 
matrices D q (ip) and R n {ip), respectively. Here (3 S W and ^Sf* are fixed 
parameters. 

The mixed ANOVA model, and the longitudinal models including the 
Fay-Herriot model and the nested error regression model are special cases 
of (1.1). We can consider both balanced and unbalanced lay-outs in the above 
framework. In addition, we develop our theory and methodology allowing 
for the parameter dimension d = p + k to grow with sample size n. Dimen- 
sion dependent asymptotics are extremely important in the current context, 
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since many small areas may have sample sizes comparable to dimensions of 
the regression and variance components parameters; see, for example, Jiang 
(1996) for their use and importance in linear mixed models. 

Our approach toward approximating the distribution of a predictor is to 
employ parametric bootstrap. We concentrate on the empirical best linear 
predictor (EBLUP), owing to its wide popularity and use. We establish in 
Theorem 3.1 that the bootstrap histogram incurs an error of 0(d 3 n~ 3 / 2 ) 
in approximating the distribution of a centered and scaled EBLUP. For 
estimating the distribution of centered and scaled estimators, under stan- 
dard regularity conditions and fixed d, the normal approximation based on 
the central limit theorem has an error of 0(n -1 / 2 ), but the bootstrap can 
achieve higher-order accuracy, with typical approximation error of 0(?i~ 1 ). 
Theorem 3.1 may be seen as an extension of this higher-order accuracy 
phenomenon, in the context of prediction. Although our motivation and ter- 
minology comes from small area context, our bootstrap methodology and 
theoretical results are directly applicable to other usages of mixed linear 
models. 

There are several potential applications of a highly accurate approxima- 
tion of the entire distribution of the EBLUP. For example, it may be used to 
obtain (a) bagging predictors, (b) computing mean squared errors or other 
risks, (c) hypothesis testing, (d) calibration of traditional estimators, and 
(e) prediction interval construction. In this paper, we concentrate on the last 
application, since prediction intervals combine features of both point predic- 
tion and hypothesis testing nicely, and have not been extensively explored 
in small area or other mixed linear model contexts. 

Prediction intervals are useful in small area studies in several ways. For 
example, prediction intervals may help establish if different counties have 
similar resources and needs, or if different ethnic or other subpopulation 
groups are equally exposed to a particular disease. Our simultaneous con- 
centration on dimension asymptotics is also relevant. It has long been recog- 
nized that health, economic activity and other measures of human well-being 
depend on a number of exogenous and endogenous factors, many of which 
must be measured at the individual level and incorporated in the model. In 
statistical terms, this translates to high dimensionality of (5 and ip. 

In Section 2, we review some of the existing techniques for predictor distri- 
bution approximation and interval estimate construction. We pay special at- 
tention to the usage of resampling in such approximations/constructions. For 
prediction intervals, available literature is heavily concentrated on special 
cases of the Fay-Herriot model. Since traditional intervals perform poorly 
in terms of coverage or length or both, many attempts have been made to 
fine tune and calibrate them, often using resampling. To the best of our 
knowledge, approximation of the entire distribution of a predictor has not 
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been attempted in general small area problems, and we briefly review the 
related research for independent data. 

In Section 3, we present our bootstrap algorithm for the Das, Jiang and 
Rao model (1.1). Our main result is that the sup-norm distance between 
the distribution of EBLUP and its bootstrap approximation is 0(d 3 n~ 3 ^ 2 ). 
A direct corollary is that the bootstrap prediction interval has coverage 
accuracy of 0(d 3 n" 3 / 2 ). Note that our proposed prediction interval is a 
bootstrap interval, which is different from the traditional approaches of ob- 
taining asymptotic intervals first and then calibrating it. Our interval can 
be calibrated one or more times to achieve coverage accuracy of 0(d 5 n" 5 / 2 ) 
or higher, if needed. 

We performed several simulation experiments in order to study how our 
percentile bootstrap interval estimate compares with existing techniques. A 
sample of these studies are reported in Section 4. The main message from 
the simulations is that the prediction intervals resulting from the proposed 
parametric bootstrap perform considerably better than the traditional tech- 
niques, which is a reflection of the high order accuracy theoretically estab- 
lished in Section 3. 

2. A review of predictor distribution approximation and interval con- 
struction. 

2.1. Approximating distributions of predictors. Considerable theoretical 
research has been carried out in the prediction of a random variable that is 
independent of Y n , and has density £(-\/3,ip). In terms of expected Kullback- 
Leibler divergence, the naive plug-in predictor density £(-|/3,V>) performs 
poorly compared to Bayesian predictors J £(-|/3, i/j)tt(/3, ip\ Y n ), see for exam- 
ple, Aitchison (1975), Murray (1977), Ng (1980), Komaki (1996, 2001, 2006) 
and George, Liang and Xu (2006). Harris (1989) showed that the bootstrap 
predictor 

(2.1) £*(•) = Ja-\s,t)dC*(s,t) 

also performs better than the naive plug-in predictor. Recently, Fushiki, 
Komaki and Aihara (2004) have shown that the bootstrap predictor (2.1) 
is asymptotically equivalent to a Bayesian predictor with Hartigan's M- 
prior. The M-prior has certain optimality properties which may be found in 
Hartigan (1964, 1998). In a related work, Fushiki, Komaki, Aihara (2005) 
show that the Harris predictor is related to bagging of Breiman (1996). 

In small area or other mixed linear model contexts, the random variable 
of interest depends on Y n , unlike the framework described above. Also, 
performance measures other than expected Kullback-Leibler divergence may 
be of interest, for example, length and coverage of prediction intervals. 
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2.2. A review of interval estimation techniques. For a general mixed lin- 
ear model, Jeske and Harville (1988) proposed a prediction interval for a 
mixed effect, but did not include the effect of estimated unknown variance 
components on the accuracy of their proposed interval. 

Jiang and Zhang (2002) used a distribution-free method for constructing 
prediction intervals for a future observation under a non-Gaussian linear 
mixed model, based on the theory developed by Jiang (1998). This tech- 
nique does not employ any area specific information and can be useful in 
constructing intervals when there is no survey data on the response variable. 
Jiang and Zhang (2002) proposed another method which can be applied to 
the situation when the sample size is large within each area. This is a tech- 
nique of first obtaining the EBLUP for the random effects and the residuals. 
Then, under conditions sufficient to imply that the number of times each 
random effect is repeated (i.e., number of observations in each small area) 
tends to infinity, the empirical distribution of random effects as well as the 
residuals converge appropriately. This technique fails when we do not have 
large samples for each small situation that is common in many small 

area applications. 

Recently, Hall and Maiti (2006b) have studied parametric bootstrap for 
general mixed models in several aspects, including interval estimation. A 
review of their approach toward interval estimation may be found in Rao 
(2005). In Section 3, we discuss in detail how their model, results and asymp- 
totics differ from ours. 

Other than the above three papers, research on small area prediction 
intervals is largely concentrated on special cases of the Fay-Herriot model, 
described below: 

1. Conditional on = (6q, . . . , 9 n ) T , Y n = (Y\, . . . , Y n ) T follows a n-variate 
normal distribution with mean 9 and dispersion matrix D with known 
diagonal entries Di > and off-diagonal entries 0. Here (and in the sequel) 
all vectors are taken to be column vectors, for any vector (matrix) a (A), 
the notation a T (A T ) denotes its transpose. 

2. The variable follows a n-variate normal distribution with mean X/3 for 
a known n x p matrix X and unknown but fixed vector (3 G W . The dis- 
persion matrix is AI n , where the matrix \ n is the n dimensional identity 
matrix and A is an unknown constant. 

There are several options for constructing interval estimates for 6i = 
x[(3 + Vi. One may use only the Level 1 model for the observed data, or 
only the Level 2 model for the borrowed strength component, or a combina- 
tion of both. The interval for 9i based only on the Level 1 model is given by 
if (a) : Yi ± z a / 2 Dy 2 , where z a / 2 is the (1 — a/2)th standard normal quan- 
tile. Obviously, for this interval, the coverage probability is 1 — a. However, 
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it is not efficient, since its average length is too large to make any reasonable 
conclusion. This is due to the high variability of the point predictor Yi. 

An interval based only on Level 2 ignores the crucial area specific data 
that is modeled in Level 1, and hence falls short on two counts: it fails 
to be relevant to the specific small area under consideration, and it fails 
to achieve sufficient coverage accuracy. A small example given later in this 
section demonstrates this latter property. 

Thus, interval estimation techniques that combine both levels of the Fay- 
Herriot model are required. A popular approach is to employ empirical Bayes 
methodology. Cox (1975) proposed the following empirical Bayes interval: 

If {a) : (1 - Bi)Yi + B^fi ± z a/2 Dl /2 (l - B^ 2 , 

where Bi and are estimators of Bi = Dij '{A + Di) and /3, respectively, 
and jc[ is the ith row of X. Under standard regularity conditions, P(0j £ 
if (a)) = 1 — a + 0(n _1 ), where P denotes a probability measure induced 
by the joint distribution of Level 1 and Level 2. Thus, this prediction inter- 
val attains the desired coverage probability asymptotically, but the cover- 
age error is of order 0(n _1 ), which is not accurate enough for many small 
area applications. This lack of accuracy may partially be due to the addi- 
tional variability resulting from estimation of (3 and A. Currently, MSPE 
estimators are available in several mixed linear models, see for example, 
Jiang Lahiri and Wan (2002), Datta, Rao and Smith (2005), Hall and Maiti 
(2006a). Naive empirical Bayes intervals constructed using EBLUP, MSPE 
estimators and standard normal quantiles typically have an error of 0{n~ l ) 
or higher. 

For a special case of the Fay-Herriot model with common mean and equal 
sampling variances Di = D, Morris (1983a) incorporated the additional un- 
certainty due to the estimation of the hyperparameters. However, Basu, 
Ghosh and Mukerjee (2003) showed that the resulting empirical Bayes in- 
terval proposed by Morris (1983a) still has coverage error of 0{n~ l ). They 
used analytical calibration of the Morris' interval to reduce the coverage 
error to o(n _1 ). They also showed that with suitable analytical approxima- 
tions in place, an interval due to Carlin and Louis (1996), page 98, and a 
new interval, have coverage error of the order o(re _1 ). Datta et al. (2002) 
used similar analytical calibration in a more general Fay-Herriot model, 
and obtained a prediction interval with coverage error of 0(n -3 / 2 ). Morris 
(1983b) considered a variation of his (1983a) work with the use of a hierar- 
chical Bayes type point estimator. Hill (1990) suggested a general framework 
which, in the Fay-Herriot setting matches with an exact hierarchical Bayes 
confidence interval. Datta et al. (2002) followed up Hill's idea to obtain an 
interval with coverage error of 0{n~ l ). 

Apart from the analytical approaches, calibration using different boot- 
strap techniques has been popular. The methods differ in the generation of 
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the bootstrap samples and the type of correction made. For a special case of 
the Fay-Herriot model where Y\, . . . ,Y n are independent and identically dis- 
tributed, Laird and Louis (1987) proposed three different resampling strate- 
gies: (a) usual nonparametric bootstrap by sampling with replacement from 
the data, (b) a semi-parametric method, assuming density at the first level 
of their two level model is known but that at the second level is unknown, 
and (c) the parametric bootstrap. In mixed linear models, the nonparamet- 
ric and semi-parametric bootstrap approximation of the distribution of the 
EBLUP are generally not consistent. Once the bootstrap sample (nonpara- 
metric, semi-parametric or parametric) is generated, the next challenge is to 
find a method that corrects the empirical Bayes confidence intervals if (a) 
to achieve better coverage. Laird and Louis (1987) considered an imitation 
of the hierarchical Bayes approach. 

Carlin and Gelfand (1990, 1991) point out that the hierarchical Bayesian 
methods like those of Laird and Louis (1987) lead to a lengthening of the 
empirical Bayes interval, which is not the same as a correction. They discuss 
an example where increasing the length further exacerbates the coverage 
bias. They suggest parametric bootstrap to calibrate the empirical Bayes 
interval. 

Calibration of intervals has been one of the major uses of bootstrap for 
some time, and can lead to considerable improvement of coverage accuracy. 
Coupled with use of bias correction, use of pivotal or nearly pivotal statistics, 
and Edgeworth corrections, improvements from calibration can sometimes 
be dramatic. See Abramovitch and Singh (1985), Beran (1990a, 1990b), 
the book by Efron and Tibshirani (1993) and references therein for further 
details on these issues. On the other hand, calibration is both time and 
computational effort intensive, often requiring iterative searches; it typically 
increases variability; the results often lack straightforward interpretability; 
and successive calibrating steps typically have diminishing returns in terms 
of improvement of coverage. It is not always clear what property of an inter- 
val, that is, length, coverage, end points or some other characteristic, ought 
to be calibrated, see for example, DiCiccio and Efron (1996) and the dis- 
cussions of it by Hall and Martin (1996), Lee and Young (1996); and the 
interesting example in the rejoinder. Some calibrating options do not exist 
for multivariate confidence or prediction regions. Asymptotic results suggest 
calibrated intervals have better coverage accuracy, but do not consider the 
variability induced by the calibration, do not represent performance in finite 
samples; or reflect the degree in which the finite sample results depend on 
unknown parameters and their estimators. Nevertheless, calibration is an 
excellent tool to improve coverage of intervals; though it seems sensible to 
use a more accurate interval and little or no calibration; rather than a less 
accurate interval with intensive calibration. The bootstrap interval we ob- 
tain in Section 3 is one such highly accurate interval, and requires the same 
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amount of computational effort as one round of bootstrap based calibration 
of Carlin and Gelfand (1990, 1991). 

Hall (2006) suggested an application of the nonparametric bootstrap con- 
fidence interval based on the generated #*'s only. In the small area context, 
this may be applicable when the differences between the small areas are 
minor, or carried only in the fixed effects. In surveys, robustness is always 
an important issue, and the practitioners are always interested in efficient 
nonparametric methods. However, due to scarce data at the small area level, 
nonparametric estimators tend to under-perform, often severely. This is be- 
cause the nonparametric models typically permit the generation of boot- 
strap histograms based on a synthetic model or the regression model, but 
do not permit approximation of the conditional distribution of 0i given the 
data Y n . As a result, the nonparametric bootstrap prediction interval for 
9i is likely to underweight the area specific data. Accurate weighting of the 
area specific data is important for achieving good coverage properties, as 
the example below shows. Hall (2006) also pointed out the importance of 
parametric bootstrap in small area estimation and other related problems. 

Example. Consider the following special case of the Fay-Herriot model 
where Oi = 1, and x.J/3 = \i. Thus, at Level 1, Y^s given the 0j's are inde- 
pendently distributed as N(8i, 1) random variables; and at Level 2, the 6i's 
are independent, identically distributed as N([i,t 2 ) random variables. The 
estimators of fj, and r 2 are given, respectively, by p, = Y, f 2 = max(0, s 2 — 1), 
where s 2 = J2{Vi~ 2/) 2 / ( n — 1 ) • Assume f 2 > , a condition that is satisfied in 
many problems. The bootstrap procedure would require us to generate 6* ~ 

N[fi, f 2 ] and Y*\9* ~ d N[6*, 1]. Then we have /t* = Y*, f T = max(0, s* 2 - 1) 
where s* 2 = J^ilJi ~ V*) 2 /{ n ~ !)• An obvious Level 2 based bootstrap pre- 
diction interval for Q{ that is not area specific, is given by 

(2.2) (fi-t 1 Vf 2 ,fi + t2Vf 2 ), 

where (ii,i2) are cutoff points satisfying P(/i* — 1\ \J f 2 * < 0* < fx* -\-t2^/r 2 *) = 
1 — a. 

It can be shown that interval (2.2) has coverage of 1 — a + 0(ra -1 / 2 ) which 
makes it consistent, but hardly accurate enough. The lack of accuracy is due 
to the use of the Level 2 distribution only, so that the Level 1 data Yi plays 
no special role in the interval construction. 

In Bayesian terminology, the Level 2 of the Fay-Herriot model essentially 
corresponds to a prior on 9i, while the Level 1 model yields the likeli- 
hood. Using only the "prior knowledge" (Level 2 distribution) does not even 
yield consistency in general. However, in some instances using the Level 2 
distribution in conjunction with bootstrap can have a calibration effect that 
obtains 0{n~ l l 2 ) consistency, as shown above. 
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3. Parametric bootstrap prediction interval for a general linear mixed 
model. We consider the model: 

(3.1) Y n = X/? + Zv g + e n , 

where X is a known (n x p) matrix, Z is a known (n x q) matrix, Y n £ IR n 
is the vector of observed data, (3 6 MP is a fixed but unknown parameter 
vector, and v q S M q and e n S M n are random variables following the nor- 
mal distributions N q (0,D q ) and N n (0,R n ), respectively. The integer q may 
depend on n, thus q = q n . Assume the sequence {v g } and {e n } are indepen- 
dent. The first term X/3 represents the fixed effects, and the second term 
Zvq the random effects. Thus X/3 + Zv q constitute the signal component 
of the observed data, while e n is the noise. The properties of the signal are 
of interest, which depend on the unknown parameters (3, D q and R n . 

Assume that the (q x q) matrix D q and the (n x n) matrix R n are 
known up to a (k x 1) vector of unknown parameters, thus D q = D q (ip) 
and R n = R n (ip) for a fixed but unknown ip = (tpi, . . . , ^) T E R fc . Note that 
the dispersion matrix of the observed data Y n is given by 

S n = E n ty) = RnW + ZD q (^)Z T . 

We henceforth drop the n from Y n , e n , R n and E n , and q from v q and 
D q to simplify notation. We take d = p + k, the dimension of the parameter 
space. Let 9 = ((3, ip) denote the unknown parameters. 

Das, Jiang and Rao (2004) show that several linear mixed models, in- 
cluding analysis of variance (ANOVA) models and longitudinal models of 
both balanced and unbalanced nature are special cases of the model (3.1). 
Unbalanced ANOVA models arise, for example, when R = o-Ql n ; and D = 
diag(crfl ri , . . . , af,_ l l rk _ 1 ) where I r is the r x r identity matrix. Here tp is the 
vector of variance components ip = (og, . . . , o^ ^. Unbalanced longitudinal 
models arise when £ has a block diagonal structure. 

Let T = c T (X/3 + Zv), where c is a fixed and known (n x 1) vector. The 
case where c is a n x m matrix obtains multidimensional predictive quanti- 
ties, and their treatment is similar to the univariate case described below, 
with some minor algebraic variations. We concentrate on univariate T for 
easier exposition. The conditional distribution of T given Y is N (/j,t , cr^) , 
where 

HT = c T X/3 + c T ZDZ T ^- 1 (Y - X/?) 

(3.2) 

= c T RY,- l X(3 + c T ZDZ T YT 1 Y 

and 



(3.3) 



a\ = c T Z(D - DZ T YT x ZD)Z T c. 
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Generally, (3 and ip (and hence D and R) are estimated from the data Y 
by using the marginal distribution of Y, given by N n (X./3, £). The resulting 
estimates (it and &t of the mean and variance of T are expressions similar 
to (3.2) and (3.3), with (3 and ip in place of (3 and ip. 

For algebraic simplicity, in the rest of this paper we assume that X is full 
column rank and use the estimator (3 = (X r X)~ 1 X- r Y. This is the ordinary 
least squares estimator of (3. Using other estimators like the maximum like- 
lihood estimator or the weighted least squares estimator, with appropriate 
conditions on the weights, is another possibility. This makes little difference 
in the asymptotic analysis as long as the weights are smooth functions of ip. 
Estimator ip of ip is typically obtained by maximum likelihood or restricted 
maximum likelihood techniques. 

Based on the fact that <7^ 1 (T — /xt) is a standard normal pivot, the tra- 
ditional approach to interval estimation for T, reviewed in Section 2, is to 
take (/It ± z^/mspe) for some estimator mspe of MSPE and the appropri- 
ate Normal quantile z. Unfortunately, <r^ 1 (T — /ty) is not a pivot, and the 
traditional approach produces too short or too long intervals. Let the dis- 
tribution of <r^ 1 (r — (It) be C n . Recognizing that C n is not the standard 
normal distribution, we propose to estimate it using parametric bootstrap. 

Define 

Y* = X/3 + Zv* + e* 

where v* ~ N q (0, D(ip)) and e* ~ N n (0, R{ip)) are independent of each other. 

From Y*, obtain (3* and ip* using the same techniques used to obtain (3 
and ip earlier. Next, obtain fi T and a T using (3* and ip* using (3.2) and (3.3). 
Define T* = c T (X/3 + Zv*). The distribution of 

a^*(T* -fL* T ), 

conditional on the data Y, is the parametric bootstrap approximation £* 
of C n . Using this approximation, we then proceed to obtain the interval 
estimate for T as {(it + qi^T^T + 92<5"t), where q\ and q2 are appropriate 
quantiles of the bootstrap approximation £* of C n . 

Our main result is that £* approximates C n up to 0(d 3 n~ 3 / 2 ) terms. In 
order to state the assumptions for our result, let us introduce some terminol- 
ogy and notation now. For any function f(ip) : M a — > M, f'(ip) denotes its first 
derivative written as a a x 1 column vector; f"(ip) denotes the ax a second 
derivative matrix. For a symmetric matrix A, A max and A m i n , respectively, 
denote its maximum and minimum eigenvalue. 

The following are the assumptions for our result in this section: 

1. The following relations hold: 

(3.4) ||X T c|| =0(1), 
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(3.5) ||X r S- 1 Z J DZ T c|| =0(1), 

(3.6) c T ZDZ T c = 0{l), 

(3.7) c T ZDZ T T,~ 1 ZDZ T c = 0(l). 
In addition, 

a\ = c T Z(D - DZ T YT x ZD)Z T c > M > 0, 

for some constant M > 0. 
2. Assume that 

1 2 



(3.8) sup £ 



l<i<?T. j = \ 



it 



1/2 

,a=l 



0(p/n) 



(3.9) A min (n- i X i X) > M > 0, 

for some constant M > 0. 
3. The eigenvalues of the matrices D and R lie in (L ,L) for some L > 1. 
The eigenvalues of D(ij)) and i?(V>) lie in (L~ 1 /2, 2L). The eigenvalues of 
£ lie in a compact set on the positive half of the real line. 
In the representations 

(3.10) D = D 1 (^)Df(^), D = D 1 (^)A D (4>)D^), 

(3.11) R = R 1 y>)E$(il>), R = R 1 {i>)A R {4,)R T l {i>) 1 

where A# and A/) are diagonal matrices, the following conditions are 
satisfied: 

All the entries of the q x q matrix Ajj = diag(A£>i, . . . , A# 9 ) and the 
n x n matrix Ar = diag(A^i, . . . , A# n ) have three bounded continuous 
derivatives. 

We denote by A^, the k x q matrix whose (j,i)th entries are given by 
d 

((A' D )) j ^) = — A Di (tP), j = l,...,k; i = l,...,q. 
The (j,i)th entry of the k 2 x q matrix A^ is 

ji,h = j = l,...,k 2 , i = l,...,q. 

The (j,i)th entry of the k 3 x q matrix A^ is 



12 S. CHATTERJEE, P. LAHIRI AND H. LI 

where j% + (j 2 - l)k + (j 3 - l)^ 2 = 3, ji,h,33 = l,---,k, j = l,...,k 3 , and 
i = l,...,q. 

We define the k x n matrix A^, the k 2 x n matrix and the k 3 x n 

(3) 

matrix A^ ; along identical lines as above. 
The following conditions are assumed: 



(3.12) 




= 0(1) 




(3.13) 


A max Af(V0A^(V0 


= 0(1) 




(3.14) 




= 0(1) 


J 


(3.15) 


A m axA'A T (V>)A£(VO 


= 0(1) 


3 


(3.16) 


A max Ag )T (^>gV) 


< M = 


O(l) 


(3.17) 


A max Ag )T (^)A^(^) 


< M = 


O(l) 



for some constant M > for all ^* in a neighborhood of the true value 

4. Let (S* = (k/n) l l 2 (ij) — iji). Assume that all the moments of ||5|| are 0(1). 
Moreover, the following relations are also satisfied: 



(3.18) 




= o( y 


/k/n), 


i = i,. 


(3.19) 


^s a s b 


= 0(\ 


/k/n), 


a, b = 1 


(3.20) 


ESj(Zv + e)i 




/k/n), 


j = h- 


(3.21) 


ES a S b (Zv + e)i 


= ou 


/k/n), 


a,b = l 



We now state our main theorem for this section. 



Theorem 3.1. Under the Assumptions (l)-(4), ifd 2 /n^0, we have 

(3.22) snp\£ n (q)-C* n (q)\=0 P (d 3 n- 3 / 2 ). 

The proof of Theorem 3.1 is given in the Appendix. A direct application 
of Theorem 3.1 is the following result on highly accurate prediction intervals. 

Theorem 3.2. Under the Assumptions (l)-(4) and d 2 /n^0, for any 
a G (0,1), if qi and q 2 are real numbers such that 

C* n (q 2 ) - C n {qi) = l-a, 

we have 

(3.23) F[fi T + qi&r <T<fi T + q 2 a T ] = 1 - a + 0(d 3 n~ 3 / 2 ). 
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Theorem 3.2 follows directly from Theorem 3.1, hence we omit its proof. 
Since the Fay-Herriot model (described in Section 2) is an important exam- 
ple, we state the results for it in a separate corollary below. 

Corollary 3.1. In the Fay-Herriot model, assume that the matrix X 
is full column rank, the diagonal entries ha of the projection matrix on the 
columns of X satisfy sup i ha = 0(p/n), the Level 1 variances {D{\ lie in a 
compact subset of (0, oo), and the estimator A of A is positive. Then, for any 
i € {1, .... n}, if ef B = (1 - Bi)Yi + Bv$p, 6f B * = (1 - B*)Y* + B?x?>, 
we have 

m £ 0f B + m D] /2 (l - + q l2 Dl /2 (l - A) 1/2 )] 

(3.24) 

= 1 -a + 0(p 3 ra _3/2 ); 
where Bi = Di/(A + A), and (^1,^2) satisfy 

F*[e* € (§f B * + qnD] l2 {l - Blfl\ef B * + q i2 D] /2 (l - B*) 1 ' 2 )} 
= l-a + P {p 3 n~ 3/2 ). 

The notation used in Corollary 3.1 are standard ones, that is, P* is 
the probability on the resampling scheme conditional on the data, B* = 
Di/(A* + Di), where (3* and A* are the estimators computed on the boot- 
strap data Y*. Here conditional on the data, 9* ~ N(x[j3,A), and Y*\9* ~ 
N(6*,Di) independently. Corollary 3.1 is easily derived from Theorem 3.2, 
and we omit the details of its proof. A slightly different approach to the 
same result may be found in the unpublished manuscript Chatterjee and 
Lahiri (2002). We now discuss the assumptions leading to our main result 
Theorem 3.1, and some additional features of our result. 

Remark 1 (On the dimension of the random effect vector). Note that 
the dimension q of the random effect v is arbitrary which may or may not 
depend on n. Owing to this generalization, our analysis is for T = c T (X.(3 + 
Zv), rather than the more traditional T = cjp + cjv. Since X is full column 
rank, the fixed effects in T and T are equivalent. 

Remark 2 (On the technical assumptions). In the development of all 
the assumptions above, we have preferred simplicity over generality. The 
requirement <i 2 n _1 — > is standard in dimension asymptotics. Assumption 1 
is in order to ensure T as a nontrivial quantity, that is, it ensures that 
both the fixed component and the variance of the random component of 
are O(l), and the variance a\ is bounded away from zero and infinity. By 
suitably scaling the norm of the vector c this assumption is satisfied. 
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Assumption 2 is a standard assumption on the behavior X. It ensures 
that the norm of each fixed effects covariate is of suitable order, and the 
fixed effects design is not singular. This assumption can be modified to suit 
cases where X is not full column rank, but such generalizations are routine. 

Assumption 3 is on standard differentiability and eigenvalue conditions. 
Here again, we have tried to adopt simple conditions rather than the most 
general ones. Note that the existence of the representations (3.10) and (3.11) 
are not part of the assumptions, and these representations will be established 
in the proof of Theorem 3.1. 

Also note that the eigenvalues of D(ip) and R{ip) are estimates of the vari- 
ance components in typical applications. Note that we do not allow these to 
be zero, since these must always lie in (L~ 1 /2,2L). However, L may be ar- 
bitrarily large, consequently this assumption does not limit the applicability 
of our results. 

In Assumption 4 we take all moments of S to exist in order to achieve 
simplicity. Our result involves computation of several terms involving S, 
and having all the moments of S available simplifies the algebra. In most 
applications, both ip and ip lie in a compact set, hence this is not a strong 
condition. The other moment conditions on S given by (3.18)-(3.19) are 
routine. These hold when tp is obtained using either maximum likelihood 
or restricted maximum likelihood formulation, see Jiang (1998) for related 
developments. 

Conditions (3.20)-(3.21) are interesting, since they effectively set a limit 
to the amount of dependency structure we can have in £. In order to visualize 
this, suppose ^~ i > is the estimator of ip obtained by using only those obser- 
vations that are independent of Y^; and let S^ - v = (/c/ra) 1 / 2 ^ -4 ) — ip). Then, 
a sufficient condition for E5j(Zv + e) { = 0{{k/n) 1 / 2 ) is that S - S (_i) = 
P ((k/n)V 2 ). 

This is routinely achieved, and in particular, if Y{ is independent of all 
but a finite number of observations, we have S- = P {{k/n) 1 / 2 ). This 
is the typical situation is almost all applications of small area studies. Thus, 
the effect of Assumption 4 is to restrict the complexity of the matrices D 
and R. 

Remark 3 (On the nature of prediction intervals). A prominent ap- 
plication of the highly accurate approximation of C n {-) by C^(-) is stated 
in Theorem 3.2, that is, in the construction of prediction intervals. Note 
that these are bootstrap intervals, as opposed to the traditional intervals 
described in Section 2, some of which are improved with bootstrap correc- 
tions. 

However, Theorem 3.2 does not describe the nature of the bootstrap 
prediction intervals, since the choice of q\ and q2 can be quite arbitrary. 
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These may be chosen to ensure either an equal tail property of the interval; 
whereby £* (qi) = a/2 and £* (qi) = 1 — a/2; or we may chose these accord- 
ing to a minimum length of interval property, that is, we minimize the 
length &t(Q2 — Qi)- The simulation experiments reported in Section 4 show 
that both equal tailed and minimum length bootstrap prediction intervals 
typically achieve the desired coverage accuracy without the use of elaborate 
calibrations; and the minimum length interval is always slightly shorter than 
the equal tailed one. 

Remark 4 (On multivariate prediction). Note that in place of the real 
valued T studied above, we could have a vector valued T with little change 
in methodology. The algorithmic and algebraic details are similar, and the 
main result of high order accuracy of distributional approximation (3.22) 
holds. The major difference between univariate and multivariate prediction 
is in the construction of prediction regions. Instead of the two points q\ and 
q2, we need to obtain probability concentration regions from the bootstrap 
distribution. Such regions can be obtained using various data depth notions 
and shape features, for example, as in Yeh and Singh (1997). This is a 
separate issue from the one addressed in this paper, and will be handled 
in a different paper. Note that multidimensional probability concentration 
regions can be quite hard to calibrate in practice. Some techniques, such as 
calibration of the end points of an interval, are not available in this case. 

Remark 5 (Asymptotics on total sample size n). One important feature 
of Theorem 3.1 is that the asymptotic limits are obtained with total sample 
size n tending to infinity. The total sample size n is the sum total of all 
observations made, counting each repeated measurement on each individual 
unit in each small area as a distinct observation. This allows Theorem 3.1 to 
be used with considerable flexibility, for example, when number of individual 
units in small areas are large, or when number of small areas are large, 
or both. However, requirements of asymptotic negligibility, as in (3.20)- 
(3.21), must still be met. Our assumptions are designed for the more realistic 
applications where number of small areas are large. 

In general, for mixed linear models asymptotic limits are obtained either 
when the number of observations in each small area tends to infinity, or 
when the number of small areas tend to infinity; see McCulloch and Searle 
(2001) and Rao (2003) for details. Theorem 3.1 is a breakthrough, owing to 
the greater flexibility it allows in asymptotics. 

Remark 6 (On area specific properties). The area-specific signal for 
each small area is of Tj = [3 + Z?V, conditional on the observed ith small 
area data Yj. Distributions of predictors for such area specific signals are 
effectively captured by our bootstrap predictive distribution approximation. 
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Consequently, our bootstrap prediction intervals for Tj are also area specific. 
Extensions to compare two or more small areas can be obtained by similar 
techniques, see comment on multidimensional prediction above. 

In the prediction interval described in Theorem 3.2, we have considered 
unconditional coverage, where probabilities are computed over the joint dis- 
tribution of Y and v. This establishes the performance of the area-specific 
interval that depends on v conditional on Y, as well as variability due to 
observations Y. 

Alternatively, one might compute the area-specific (random) coverage, 
which is defined as P[Tj £ ipj|Yj], where Ipi is the prediction interval. The 
interval proposed in Theorem 3.2 achieves Op(d 2 /n) order of area-specific 
coverage accuracy, since some smoothing effects arising from the distribution 
of Y are absent. This is no worse (and in some cases, better) than the area- 
specific coverage obtained by other techniques in special cases of the general 
linear model (3.1). 

Remark 7 (On calibration). Both the unconditional as well as the area- 
specific coverage can be improved by calibration. The use of calibration cou- 
pled with resampling is an active topic of research, and some discussion on 
this has been presented in Section 2. The coverage accuracy of the prediction 
interval of Theorem 3.2 can be improved to 0(d 5 n~ 5 ^ 2 ) with one round of 
calibration, and further still with more calibration. Such calibration may be 
done either on the probabilities corresponding to the two end points as in 
DiCiccio and Efron (1996), or on the true coverage of the interval. Some of 
our simulations, not reported in this paper, suggest that it is not always ben- 
eficial to attempt boosting the theoretical coverage probability, disregarding 
other properties of the interval. For example, variability of calibrated inter- 
vals are greater than uncalibrated ones, minimum length property is almost 
never preserved, and the results are quite dependent on the parameters and 
fixed constants of the problem. Hence, it seems reasonable to work with a 
good predictive distribution as in Theorem 3.1, instead of starting with a 
naive interval and embarking on intense iterative calibration. 

Remark 8 [The parallel work of Hall and Maiti (2006b)]. Recently, 
Hall and Maiti (2006b) studied parametric bootstrap methods for general 
small area models, and considering the overlap of the topics studied in their 
paper and this one, deserve special mention. For this comment, we use some 
notation from Hall and Maiti (2006b) whenever they are not in conflict with 
the notations in the rest of this paper, but use our notation otherwise. 

For a suitable function /«(/3) involving co-variates Xi = (Xa, . . . , Xi ni ) and 
parameter (3, they consider random effect Oj ~ Q(-; fi(/3), £), and conditional 
on 0j, the data Yy are independent observations from R(-;ip(@i),rji), for j = 
1, . . . ,rii, i = 1, . . . , m. Here ip(-) is a known link function, £ and r^'s are either 
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parameters or known constants, and Q(-) and R(-) are known probability 
distribution functions. They go on to study calibration of the mean squared 
prediction error (MSPE) and interval estimation with parametric bootstrap. 

Their model is broad enough to handle nonlinear mixed effects, which our 
model (1.1) cannot do. However, their assumption of Yij's being indepen- 
dent means that they do not consider longitudinal models, or other models 
with temporal or spatial dependence. This is essentially the case R being 
a multiple of the identity matrix in our set-up. Our model is broader than 
Hall and Maiti's in including several varieties of dependence structure. 

The interval estimate from their Section 2.8 is 

(3.25) I a = X ff3±z a/2 A 1 / 2 

for the Fay-Herriot model. Rao (2005) noted that this interval does not 
make use of the area-specific direct estimator, unlike the prediction interval 
proposed by Chatterjee and Lahiri (2002) . Hall and Maiti (2006b) calibrate 
this interval for better coverage accuracy, improving from their result 

(3.26) P[G; G I a ] = 1 - a + 0{m~ l ). 

The result (3.26) hold when the probability statement is on the marginal 
distribution of random effect Gj, and estimators (5 and A are independent 
of ith area data Yn , . . . , Yi ni ■ 

Our probability statements in Theorems 3.1 and 3.2 are, however, on the 
joint variability of the random effects and data (@i,Yn, . . . ,Yi ni ). Also note 
that Theorem 3.2 is obtained as n = J2iLi n % ~~ * 00 > while (3.26) is obtained 
as ?7i — > co. Since some of the raj's can be large, the speed of convergence 
toward the asymptotic limits are different; and m = o(n) if any rij — > oo. 

Hall and Maiti (2006b) obtain that if I a is calibrated once (twice), the 
coverage accuracy improves to 0(m~ 2 ) [0(m -3 )]. If the interval in (3.23) or 
(3.24) is calibrated once (twice), the coverage accuracy improves to 0(ra" 5 / 2 ) 
[0(n~ 7 / 2 )] when parameter dimension is fixed. 

In summary, Hall and Maiti (2006b) cover a wide ranging independent 
data framework, with careful MSPE estimation and marginal coverage of 
prediction intervals as number of small areas increases; while we consider 
deeper linear mixed framework allowing for longitudinal dependence, and 
establish results as total data size increases on the joint variability of random 
effects and data, thus also obtaining area specificity. 

4. A simulation example. In this section we compare the performance of 
our proposed parametric bootstrap with that of the traditional approaches, 
using a simulation study. We have carried out more extensive simulations 
which reflect the general pattern of performance reported here; the details 
are available from the authors. 
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Table 1 

Average coverage and average length of different intervals (nominal coverage = 0.95) in 

simulation pattern (a) 



Group 


Cox 


FH 


PR 


PB-ET 


PB-SL 


Gl 


83.1 (3.12) 


90.4 (3.57) 


92.4 (3.82) 


96.1 (4.50) 


95.7 (4.42) 


G2 


85.4 (2.14) 


93.7 (2.50) 


98.0 (3.19) 


96.2 (2.83) 


95.9 (2.79) 


G3 


85.8 (2.02) 


93.9 (2.36) 


98.0 (3.08) 


96.0 (2.65) 


95.6 (2.61) 


G4 


86.1 (1.89) 


94.3 (2.19) 


98.2 (2.93) 


96.1 (2.43) 


95.7 (2.39) 


G5 


89.7 (1.12) 


95.2 (1.23) 


97.3 (1.87) 


95.7 (1.28) 


95.3 (1.26) 



For the sake of comparability with existing studies, we adopt part of the 
simulation framework of Datta, Rao and Smith (2005) for our study. We 
consider the Fay-Herriot model with m = 15 and xJ"/3 = 0, and consider five 
groups of small areas with three areas in each group. Within each group, 
the .Dj's remain the same. There are two different patterns for the ZVs: (a) 
0.2, 0.4, 0.5, 0.6, 4.0 [this is pattern (c) of Datta, Rao and Smith (2005)] 
and (b) 0.4, 0.8, 1.0, 1.2, 8.0. For pattern (a), we took A = 1 in order to 
make the results comparable to Datta, Rao and Smith (2005). For pattern 
(b), we took A = 2 in order to make the variances twice that of pattern (a), 
but preserve the B{ = Di/{A + D{) ratios. 

We obtain all the results based on 10,000 simulation runs. The Prasad- 
Rao method-of-moments, and the Fay-Herriot method of estimating the 
variance component A are considered. Tables 1 and 2 report the simulated 
coverage probabilities and average lengths of several different prediction in- 
tervals (with nominal coverage 0.95) under patterns (a) and (b), respectively. 
We consider three prediction intervals of the type EBLUP ± 1.96-y/ mspe, 
where mspe is an estimator of the MSPE of EBLUP. The Cox interval, dis- 
cussed in Section 2, is obtained by using Prasad-Rao method-of-moment 
estimator of A. The Prasad-Rao (PR) interval estimator is obtained using 
that estimator of A along with the Prasad-Rao (1990) MSPE estimator. 
The Fay-Herriot (FH) interval estimator is obtained by using Fay-Herriot 



Table 2 

Average coverage and average length of different intervals (nominal coverage =0.95) in 

simulation pattern (b) 



Group 


Cox 


FH 


PR 


PB-ET 


PB-SL 


Gl 


85.5 (4.87) 


89.5 (5.18) 


89.3 (5.35) 


95.7 (6.55) 


95.4 (6.47) 


G2 


83.6 (2.68) 


86.0 (2.82) 


87.3 (2.93) 


95.2 (3.80) 


94.9 (3.75) 


G3 


83.4 (2.49) 


85.7 (2.60) 


86.8 (2.71) 


95.2 (3.53) 


94.9 (3.49) 


G4 


82.9 (2.27) 


85.0 (2.36) 


86.2 (2.46) 


95.0 (3.22) 


94.5 (3.18) 


G5 


83.0 (1.21) 


84.0 (1.23) 


84.8 (1.29) 


94.9 (1.72) 


94.6 (1.70) 
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method of moments estimator of A [see Fay and Herriot (1979)], and the 
MSPE estimator of EBLUP considered by Datta, Rao and Smith (2005) . 
Along with these three, we report two different parametric bootstrap pre- 
diction intervals. In both the methods, we used Fay-Herriot method of esti- 
mating A, and the weighted least squares estimator of (3. The first bootstrap 
interval is equal-tailed (PB-ET), and the second is the shortest length pre- 
diction interval (PB-SL). For both cases, we considered bootstrap sample 
of size 1000. 

The figures in Table 1 are average coverage probabilities and average 
lengths for each prediction interval method for pattern (a), average being 
taken over all three small areas within each group. Table 2 reports similar 
results for pattern (b). It is clear that the results depend on the pattern 
of DiS. The Cox prediction interval method consistently undercover. For 
pattern (a), both parametric bootstrap prediction interval methods perform 
better than the Prasad-Rao and Fay-Herriot prediction intervals in terms of 
coverage errors. In this case, the Fay-Herriot method interval always under- 
covers, while the Prasad-Rao method interval switches from undercoverage 
to considerable over-coverage. The Prasad-Rao and the Fay-Herriot meth- 
ods suffer from large undercoverage errors for pattern (b). In contrast, the 
performances of our parametric bootstrap methods remain stable over these 
two different patterns and always close to the target nominal level. Our 
minimum length parametric bootstrap method tends to provide shorter pre- 
diction intervals compared to the equal-tailed equivalents. 

These performance patterns are repeated for other sample sizes, and other 
patterns of Di values in our simulations. It is generally seen that an increase 
in the variances results in the traditional intervals performing even more 
poorly. Thus, while both the Prasad-Rao and the Fay-Herriot MSPE es- 
timators enjoy good theoretical properties, the resulting interval estimates 
suffer owing to the enforced symmetry and normality assumption. 

APPENDIX 

Proof of Theorem 3.1. We establish this result by obtaining an 
asymptotic expansion of C n (q). An identical expansion holds for C^(q), 
which leads to the result. In this proof, the letter capital C, with or without 
suffix, will be generic for constants. 

For the projection on the column space of X we use the notation P x , thus 

P x = X(X T X)~ 1 X T . 

Let </>(•) (3>(-)) be the standard Normal probability density (cumulative 
distribution) function. Let (j)' and <p" denote the first and second derivative 
of 4>{-)-, thus for iGlwe have cp'(x) = —x(fi(x),(j)"(x) = (x 2 — l)<fi(x). Define 

Q(q,Y) = a^ l {fi T - + q(&T ~ cr T )} ■ 
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Then for any q £ K, we have 



C n {q)=¥{a^{T-[i T )<q) 

= E[P(^ 1 (T -fir)<q + a^{fpr - + q{°T - °"t)}|Y)] 
= E[*(g + Q(g,Y))] 

= + 0(g)EQ(g, Y) - 2- 1 #(g)EQ 2 (g, Y) 



+ 2 _1 E^ / (q + Q- xf{x 2 - l)^(x) cfo 



= $( 9 ) + 0(9)T!( g ) - 2~ 1 #(( / )r 2 (g) +T 3 (g). 



Notice that for x G (g, g + Q), we have < \q + Q — x\ < \Q\ and (x 2 — 
1) x 4>{x) < 20(a/3), we have 



From the following calculations it will follow that EQ 8 = 0(d 8 n 4 ), whereby 



£(>). Note that Mr 1 + ZDZ T t- 1 - RT,- 1 - ZDZ T Y,~ X = almost surely. 
Hence we have 

p, T -H T = jRE^Xp + c T ZDZ T ±~ 1 Y - jRX^Xp - c T ZDZ T T,~ 1 Y 
= c T [I - ZL'Z T S~ 1 ]P x (Zv + e) 

+ ^(ZDZFt' 1 - ZDZ T T,~ 1 )(I - P x )(Zv + e) almost surely. 
In view of the above, let us write Q(q,Y) = Q\ + Q2(Y) + Qs(q,Y), where 
Q 1 = <Jt 1 c t \1 - ZL>Z T S- 1 ]P x (Zv + e), 
Q 2 (Y) = a^ l c T {ZbZ T t~ 1 - Z J DZ T E" 1 )(I - P x )(Zv + e), 

Qz(q, y ) = q^i^T - ctt)- 

First, using the decomposition 

Qi = a^JPxiZv + e) - C jy 1 c T Z J DZ T S- 1 P x (Zv + e) = Q n - Q 12 . 
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Using Assumption 1, in particular, (3.4)— (3.7), with some amount of algebra 
we can conclude that EQi = 0, EQf x = C^n" 1 ), EQ 8 U = 0(p 4 n" 4 ), EQ 2 12 = 
0{n~ l ), EQ 8 12 = 0(p i n- 4 ). 

We now analyze Q2(Y) and Q3(q, Y). These are considerably more com- 
plicated than Q\. We initially break down these two quantities in terms of 
more tractable variables Wi,...,Wn and remainder terms. The variables 
Wi,...,Wn depend on ZDZ T and S" 1 and their population equivalents. 
We need to compute the first, second, fourth, eighth and sixteenth moment 
of the W^s and show that the remainder terms are negligible. 

Toward that goal, our next step is to expand ZDZ T in equation (A.l) and 
in equation (A. 2) in terms of simpler matrices. Then we obtain asymp- 
totic expansions of the matrix entries, whereby at last we have sufficient 
ingredients for the moment computations of W\, . . . , W\\ and the remainder 
terms. We skip the details of the moment calculation algebra, of which there 
are several hundreds to compute. However, our assumptions are sufficient to 
establish the end result that EQ 2 (Y), EQf(Y), EQ 3 (q,Y) and EQ§(g,Y) 
are all 0(d 2 /n). The expansion of Q2(Y) is as follows: 

Q 2 (Y) = a^ 1 c r (ZDZ T t~ 1 - ZDZ T T,~ 1 )(I - P x )(Zv + e) 

= a^ 1 c T [{ZbZ T - ZDZ T )YT X 

+ ZDZ T {±- 1 - 5T 1 ) 

+ (ZDZ T - ZL>Z T )(S- 1 - £ _1 )](I - P x )(Zv + e) 
= W l + W 2 + W 3 . 

Now define W = o^ 2 {o\ — a\). We will simplify Q 3 (q, Y) in terms of W . 
However, we first need to simplify W . For this, we have 

W = a^ 2 [a\-a 2 T ) 

= a^ 2 c T [{ZDZ T - ZDZ T } 

- {zbz T t~ l zbz T - ZDZ T Y,- 1 ZDZ T }}c 

= <Jt 2 c t [{ZDZ t - ZDZ T } - ZDZ T {±~ 1 - YT X )ZDZ T 

- ZDZ T T,- 1 {ZDZ T - ZDZ T } - {ZDZ T - ZDZ T }T I ~ 1 ZDZ T 

- {ZDZ T - ZDZ T }T,~ 1 {ZDZ T - ZDZ T } 

- {ZDZ T - ZDZ T }(±~ 1 - YT X )ZDZ T 

- ZDZ^XT 1 - Y,- l ){ZDZ T - ZDZ T } 

- {ZDZ T - ZDZ T }(t~ 1 - YT x ){ZbZ T - ZDZ T }]c 
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Let us now simplify Qz(q, Y): 

Q 3 (q,Y) = qa^}{a T - a T ) = q(a^ 1 a T - 1) 

= q[W/2-W 2 /8 + r n \. 

At this stage, we use a result from Rao (1965), page 41, result (lc.3.10): If 
A is positive definite and B nonnegative definite n x n symmetric matrices, 
then, we can write A = Ya=i AiAj , = Ya=i biAiAj where A^s are vectors, 
bi's are real constants. Moreover, since A is positive definite, the Ai's form 
a basis (but not necessarily an orthogonal basis) of M. n . Another way of 
writing the same thing is A = AiAj, B = AiAbAJ where the columns of 
Ai are the A^s, and A R is a diagonal matrix with entries bi. Note that A\ 
is nonsingular. 

We use this result twice. First, we take R as A and R as B, and then we 
take D as A and D as B. Thus we have 

R = R 1 ( k if>)Rj{if>), R = R 1 ty)A R $)I$W>), 
D = D^Dltii), D = D^Ad^dJ^). 

Here the nonsingular matrices R\ and D± depend on the unknown parameter 
tp, while A R and Ap are diagonal matrices depending on the estimator ip. 
Based on the above, we have 

£ = R X \L + R{ 1 ZD 1 dJz t R^ t ]rJ = R 1 [I + AA T ]i?f , 

where A = R^ l ZD\. We define Bq = [I + AA T \~ 1 / 2 , the symmetric square 
root. Hence, = R^ T B 2 R^ 1 . We also have 

£ = ili [A fl + R^ 1 ZD 1 A D Dj Z t R^ t ]R,J = R 1 [A R + AA D A T ]Rj . 

Let us write A R = I + (k/n) 1/2 U R , A D = I + (k/n) 1/2 U D . Our next step is 
to write ZDZ T and £ _1 using JTr and C/d. Thus 

Z£)Z T = i?i,4A D A' r J Rf 

= ZDZ T + (k/n) 1/2 R 1 AU D A T Rj, 

(A.l) 

IT 1 = i?^ T [A fi + AAdA 7 "]- 1 ^ 1 

= R^ T B [I + {k/n) l l 2 U]- l B R-[\ 
where U = Bq(U r + AUdA t )Bq. We further simplify S" 1 by writing 
[I+^/n) 1 / 2 ^]- 1 

= 1- {k/n) l ' 2 U+ (k/n)U 2 - {k/nfl^l+ik/nf^UY 1 ^ 
= I-(k/n) 1/2 U+(k/n)U 2 - (k/n) 3/2 U R . 
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Hence, 

XT 1 = R^ T B [I + {k/nf^U^B^ 1 , 

(A.2) = 5T 1 - {k / n) 1/2 R{ T BqU BqR^ 1 + (k/n)R^ T B U 2 B R^ 1 

- (k/nf/ 2 R^ T B U R B R^. 

Equations (A.l) and (A.2) will be heavily used in the analysis below. 

We now turn our attention to Ur and Ud- Recall that S = k 1 ' 2 ^ 1 ' 2 ^ — 
ip). Suppose Xm is the ith element of either Ad- We have Adi(V') = 1- Thus, 
we have 

= 1 + (k/n) 1 / 2 S T X' Di + 2- l {k/n)S T \ Di S 

+ Q~\k/nf' 2 &(ji,j2,...,j3;\Di(r))S jl S h S h , 

31,32,33 

where ip* is a point between ip and i/j. Hence, we have 
(k/n^Uoi = {\ Di (4>) - 1) 

= (k/n) 1 l 2 S T \' Dl + 2~ l (k/n)S T \" m S 

+ 6- 1 (fc/n) 3/2 E &tii,h,-.-,33-ADi(r))S h S j2 S j3 

31,32,33 

= {k/n) l / 2 U Dil + (k/n)U Di2 + {k/nf/ 2 U DiZ . 

A similar analysis holds for Ujh. 

It now remains to calculate the first, second and eighth moments of 
W u ■ ■ ■ , Wn, and establish that EWf = 0(k ie n - 8 ), EW 2 = 0{k 2 /n), EWi = 
0(k/n) for i = 1, ...,11. Some of these moments turn out to be of even 
smaller order and thus contribute negligibly. Also, certain remainder terms 
have to proved negligible. The totality of these computations involve a few 
hundred algebraic manipulations, and is reasonably routine. We sketch part 
of the computation for one of the components of W\ as an example of the 
technique used. The rest of the computations are omitted. 

Using (A.l) we obtain that 

Wi = Ot 1 c t {ZDZ t - ZDZ T )T,- 1 (I - P x )(Zv + e) 

= (k/n) l,2 OT l c T ZD 1 

x [U D1 + (k/n) 1 / 2 U D2 + (k/n)U D3 ]D 1 Z T ^ 1 (l - P x )(Zv + e) 
= W n + W 12 + W 13 . 
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Here Upj, J = 1,2,3 is the diagonal matrix whose ith diagonal entry is 
Upij computed earlier. We will sketch the computations only for 

W n = {kjn) l l 2 o^ 1 c r T,D A U D xD x 77Yr l {l - P x )(Zv + e). 

Recall that Ujjh = S T X' Di . Let the jth element of the M. k dimensional 
vector Xi)i be X' Di p j = 1, . . . , k, i = 1, . . . , q. Define the (k x q) matrix A^, 
whose (a, b)th element is X' Dba . Also define the diagonal matrix E3 whose 
ith diagonal entry is the ith element of the vector D\Z T c. Then note that 
W n = (k/n) 1 / 2 a^ 1 S T E 5 (Zv + e), where U ~ iV n (0,I n ), S depends on U, 
and E§ = A'j^E^DiZ^T,^ 1 (I — P x ). The appropriate moment properties of 
Wn now follow by applying (3.20), (3.21) and (3.12). 

The above sketch of calculations for Wn may be repeated with variations 
for the other terms as well to establish that 

£n(q) = <%) + tfn-^faM + 0(k 3 n^ 2 ), 

for a O(l) smooth quantity 7(-,v)- 

A similar representation holds for £* (g) with f3 and ij) in place of (3 and 
ip. This establishes the result. □ 
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